AITopics | cumulative reward


cumulative reward

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Combinatorial Multi-Armed Bandit with General Reward Functions

Wei Chen, Wei Hu, Fu Li, Jian Li, Yu Liu, Pinyan Lu

Neural Information Processing Systems, Apr-21-2026, 15:27:42 GMT

In this paper, we study the stochastic combinatorial multi-armed bandit (CMAB) framework that allows a general nonlinear reward function, whose expected value may depend not only on the means of the input random variables but possibly on the entire distributions of these variables. Our framework enables a much larger class of reward functions, such as the max() function and nonlinear utility functions. Existing techniques that rely on accurate estimates of the means of random variables, such as the upper confidence bound (UCB) technique, do not work directly on these functions. We propose a new algorithm called stochastically dominant confidence bound (SDCB), which estimates the distributions of the underlying random variables and their stochastically dominant confidence bounds. We prove that SDCB can achieve O(log T) distribution-dependent regret and O(√T) distribution-independent regret, where T is the time horizon. We apply our results to the K-MAX problem and to expected utility maximization problems. In particular, for K-MAX, we provide the first polynomial-time approximation scheme (PTAS) for its offline problem, and give the first O(√T) bound on the (1−ε)-approximation regret of its online problem, for any ε > 0.
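The abstract's central point, that mean estimates are not enough once the reward is something like max(), can be made concrete. The following is a minimal Python sketch, not the authors' code. It assumes every arm's reward lives on a small shared grid in [0, 1] ending at 1.0, lowers the empirical CDF by an assumed DKW-style radius sqrt(3 ln t / (2 n)) to obtain a distribution that stochastically dominates it, and computes E[max] for independent arms from the product of their CDFs. The helper names (empirical_cdf, dominant_cdf, expected_max, SUPPORT) are hypothetical, and the offline oracle that SDCB would call to choose a super arm is omitted.

```python
import math
from typing import List

# Assumption: every arm's reward takes values on a shared, sorted, finite grid
# inside [0, 1] whose last point is 1.0.
SUPPORT = [0.0, 0.5, 1.0]


def empirical_cdf(samples: List[float], support: List[float]) -> List[float]:
    """Empirical CDF F_hat(x) = (#samples <= x) / n at each support point."""
    n = len(samples)
    return [sum(1 for s in samples if s <= x) / n for x in support]


def dominant_cdf(cdf: List[float], t: int, n_obs: int) -> List[float]:
    """Lower the CDF by a confidence radius (clipped at 0).

    A pointwise-lower CDF shifts mass toward larger values, so the result
    stochastically dominates the input. The radius below is an assumed
    DKW-style choice, not necessarily the paper's constant.
    """
    radius = math.sqrt(3 * math.log(t) / (2 * n_obs))
    lowered = [max(f - radius, 0.0) for f in cdf]
    lowered[-1] = 1.0  # the removed mass is placed on the top of the support (1.0)
    return lowered


def expected_max(cdfs: List[List[float]], support: List[float]) -> float:
    """E[max_i X_i] for independent X_i, using P(max <= x) = prod_i F_i(x)."""
    expected, prev = 0.0, 0.0
    for k, x in enumerate(support):
        joint = math.prod(cdf[k] for cdf in cdfs)  # P(max <= support[k])
        expected += x * (joint - prev)              # x * P(max == support[k])
        prev = joint
    return expected


if __name__ == "__main__":
    steady = [0.0, 1.0, 1.0]  # CDF of an arm that always pays 0.5
    risky = [0.5, 0.5, 1.0]   # CDF of an arm that pays 0 or 1 with equal probability
    print(expected_max([steady, steady], SUPPORT))  # 0.5
    print(expected_max([risky, risky], SUPPORT))    # 0.75
    optimistic = dominant_cdf(empirical_cdf([0.0, 1.0, 0.0, 1.0], SUPPORT), t=10, n_obs=4)
    print(optimistic)  # [0.0, 0.0, 1.0]
```

Both toy arms have mean 0.5, yet two copies of the steady arm give E[max] = 0.5 while two copies of the risky arm give 0.75, so a learner that tracks only means cannot distinguish these two choices under a max() reward.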

artificial intelligence, data mining, machine learning, (21 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.86)
Information Technology > Game Theory (0.71)


e4343147340c9d65f4c780451eb066f9-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 11:32:34 GMT

algorithm, experiment, graph, (16 more...)

Neural Information Processing Systems

Country:

Asia (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Education > Educational Setting > Online (0.67)
Information Technology (0.67)

Technology:

Information Technology > Communications (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.46)


Appendix A Implementation Details

Neural Information Processing Systems, Feb-16-2026, 20:31:39 GMT

A.1 More Information About the Continuous Environment
We provide a detailed description of the continuous environments with constrained settings. Let's consider an optimization problem of the form: minimize α …

After analyzing Table C.1 and Figure C.1, it is evident that the B2CL, MEICRL, and InfoGAIL-ICRL … Although MMICRL-LD shows a notable improvement, its performance remains mediocre in environments involving three types of agents.

Table C.2 presents the mean ± std results of all algorithms in MuJoCo. Figure C.2 depicts the distribution of x-coordinate values in the Half-Cheetah, Blocked Swimmer, and Blocked Walker environments. It demonstrates the algorithm's capacity to infer and restore incorrect …

We employ "/" to separate the results for various … We present the mean ± std results calculated over 20 runs for each random seed.

Table excerpt (truncated): Method | Setting 1 | Setting 2 | Setting 3 | Setting 4; Feasible Cumulative Rewards: B2CL 0.24 0.40 …

Figure C.1: The feasible cumulative rewards (left two columns of the first three rows and the second-to-last row) and constraint violation rate (right two columns of the first three rows and the last row). The first row showcases the expert demonstration, followed by the results of the B2CL, MEICRL, InfoGAIL-ICRL, MMICRL-LD, and MMICRL algorithms.
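The excerpt reports feasible cumulative rewards and constraint violation rates as mean ± std over runs but does not define these metrics in the visible text. The Python sketch below shows one plausible way to compute such numbers from per-step logs. The definitions (rewards summed only until the first violation, violation rate as the fraction of violating steps), the use of the population standard deviation, and all function names are assumptions, not the appendix's definitions.

```python
import statistics
from typing import Sequence, Tuple


def feasible_cumulative_reward(rewards: Sequence[float], violated: Sequence[bool]) -> float:
    """Assumed definition: sum rewards until the first constraint violation."""
    total = 0.0
    for reward, bad in zip(rewards, violated):
        if bad:
            break  # stop accumulating once the trajectory becomes infeasible
        total += reward
    return total


def violation_rate(violated: Sequence[bool]) -> float:
    """Assumed definition: fraction of steps that violate the constraint."""
    return sum(violated) / len(violated)


def mean_std(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation across runs."""
    return statistics.mean(values), statistics.pstdev(values)


def format_mean_std(values: Sequence[float]) -> str:
    """Render a 'mean ± std' cell with two decimals."""
    mean, std = mean_std(values)
    return f"{mean:.2f} ± {std:.2f}"


if __name__ == "__main__":
    # Hypothetical per-run logs, not data from the paper.
    runs = [
        ([1.0, 1.0, 1.0, 1.0], [False, False, False, False]),
        ([1.0, 1.0, 1.0, 1.0], [False, True, False, False]),
    ]
    scores = [feasible_cumulative_reward(r, v) for r, v in runs]
    print(format_mean_std(scores))  # 2.50 ± 1.50
```

Whether a paper reports the sample or the population standard deviation changes small-sample numbers noticeably; swapping statistics.pstdev for statistics.stdev selects the sample version.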

agent type, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


a9ea92ef18aae17627d133534209e640-Supplemental-Conference.pdf

Neural Information Processing Systems, Feb-16-2026, 10:33:00 GMT

algorithm, artificial intelligence, equilibrium, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.93)


Appendix No-regret Algorithms for Fair Resource Allocation

Neural Information Processing Systems, Feb-16-2026, 00:19:59 GMT

Wang et al. [2022] considered an online resource allocation problem where the …

allocation, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


No-regret Algorithms for Fair Resource Allocation

Neural Information Processing Systems, Feb-16-2026, 00:19:55 GMT

Suppose a revenue-maximizing recommendation algorithm concludes from past data that more revenue is generated by showing the ad to Group A than to Group B. In that case, the ad-serving algorithm will eventually end up showing that ad exclusively to Group A.
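The feedback loop described above is easy to reproduce. In this Python sketch the per-impression revenues are fixed, hypothetical numbers standing in for estimates learned from past data, and the min_share floor is an illustrative fairness rule chosen for this sketch, not the no-regret algorithm the paper proposes. With no floor, a greedy server sends every impression to Group A; with a floor, Group B keeps at least the guaranteed fraction of impressions.

```python
from typing import Dict


def serve_impressions(revenue_per_show: Dict[str, float], rounds: int,
                      min_share: float = 0.0) -> Dict[str, int]:
    """Greedy ad serving with an optional per-group minimum share (hypothetical rule)."""
    shows = {group: 0 for group in revenue_per_show}
    for t in range(rounds):
        # Groups currently below their guaranteed share of the first t + 1 impressions.
        behind = [g for g in revenue_per_show if shows[g] < min_share * (t + 1)]
        candidates = behind if behind else list(revenue_per_show)
        # Among the candidates, serve the group with the highest revenue per impression.
        chosen = max(candidates, key=lambda g: revenue_per_show[g])
        shows[chosen] += 1
    return shows


if __name__ == "__main__":
    revenue = {"A": 1.0, "B": 0.8}  # hypothetical revenue per impression
    print(serve_impressions(revenue, rounds=100))  # {'A': 100, 'B': 0}
    print(serve_impressions(revenue, rounds=100, min_share=0.3))
```

A hard floor like this trades some revenue for exposure; the paper's setting instead studies algorithms that learn the allocation online with regret guarantees, which this sketch does not attempt.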

artificial intelligence, data mining, machine learning, (19 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
(4 more...)

Industry: Education (0.47)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Social Media (0.93)
(2 more...)


Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with General Utilities

Neural Information Processing Systems, Feb-14-2026, 09:21:31 GMT

In fact, the interaction of these two aspects requires addressing the fact that each agent's own safety constraint depends on information from all other agents.
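The primal-dual idea named in the title can be illustrated on a toy single-agent problem, leaving out the actor-critic and multi-agent machinery (including the cross-agent information dependence described above). This Python sketch, in which the objective, constraint, step size, and iteration count are all hypothetical, maximizes f(x) = -(x - 3)^2 subject to x <= 1 by gradient ascent on x and projected gradient descent on the Lagrange multiplier λ of L(x, λ) = f(x) - λ(x - 1). The constrained optimum is x = 1 with λ = f'(1) = 4, and the iterates approach that point.

```python
from typing import Tuple


def primal_dual(steps: int = 2000, lr: float = 0.05) -> Tuple[float, float]:
    """Gradient descent-ascent on L(x, lam) = -(x - 3)**2 - lam * (x - 1)."""
    x, lam = 0.0, 0.0
    for _ in range(steps):
        grad_x = -2.0 * (x - 3.0) - lam        # dL/dx at the current iterate
        violation = x - 1.0                    # g(x) - b; positive means infeasible
        x += lr * grad_x                       # primal step: ascend L in x
        lam = max(0.0, lam + lr * violation)   # dual step, projected onto lam >= 0
    return x, lam


if __name__ == "__main__":
    x, lam = primal_dual()
    print(f"x = {x:.4f}, lambda = {lam:.4f}")
```

The multiplier grows while the constraint is violated and pushes x back toward the feasible set; in a safe RL method, the same role is played by a multiplier on each agent's expected constraint cost.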

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country: